Presentation: "Apache Mahout's new DSL for Distributed Machine Learning on SPARK"
I will talk about software that connects two very complex worlds: machine learning and distributed data processing. Six years ago, the Apache Mahout project set out to build a library for scalable machine learning based on Hadoop's MapReduce paradigm.
I will look back on the lessons learned during my time as a developer on Mahout, both in software engineering and in running a community-driven open-source project. After that, I will talk about a major rewrite that is currently under way: Mahout will provide an easy-to-use, declarative Scala DSL for linear algebraic operations, together with an optimizer that translates programs written in the DSL to modern data processing systems such as Apache Spark.